This notebook includes preliminary attempts to visualize some basic information about the dataset.
#df <- read.csv('../../Data/indonesia_indicators_time.csv')
This section details the messiness of our dataset. First, we took a quick look at a few ways that items have been disaggregated.
When we first separated unique measures from one another, we concatenated all of the dataset's disaggregation-related columns. A cursory look turns up the breakdowns below (the categories may not be complete). When a value indicated that everyone was included (e.g., ‘ALLREGIONS’ or ‘BOTHSEX’), we did not count the measure as ‘disaggregated.’
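The flagging rule above can be sketched in base R as follows. This is a minimal illustration, not our actual pipeline. The column names `Sex` and `Location`, the toy rows, and the treatment of blanks are all hypothetical. Only the "everyone is included" codes ‘BOTHSEX’ and ‘ALLREGIONS’ come from the dataset.

```r
# Hypothetical example rows; the real dataset has more disaggregation columns.
measures <- data.frame(
  SeriesDescription = c("Series A", "Series A", "Series B", "Series C"),
  Sex      = c("BOTHSEX", "FEMALE", "BOTHSEX", "MALE"),
  Location = c("ALLREGIONS", "ALLREGIONS", "URBAN", "RURAL"),
  stringsAsFactors = FALSE
)

disagg_cols <- c("Sex", "Location")        # assumed column names
total_codes <- c("BOTHSEX", "ALLREGIONS")  # codes meaning "everyone is included"

# Concatenate the disaggregation columns into one label per row
measures$disaggregation <- do.call(paste, c(measures[disagg_cols], sep = "_"))

# A row is disaggregated if any column holds something other than a
# "total" code, NA, or a blank string
breakdown_flags <- lapply(measures[disagg_cols], function(col) {
  !(col %in% total_codes | is.na(col) | col == "")
})
measures$is_disaggregated <- Reduce(`|`, breakdown_flags)

measures[, c("SeriesDescription", "disaggregation", "is_disaggregated")]
```

`Reduce` is used instead of `sapply` plus `rowSums` so the result stays a plain logical vector, even for a one-row table.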
## Warning: 'hctreemap' is deprecated.
## Use 'data_to_hierarchical' instead.
## See help("Deprecated")
The following takes a closer look at this disaggregation. For each target (subset by goal), it shows whether the measures are disaggregated and how many are.
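A per-target breakdown like this one can be sketched in base R by splitting on goal and cross-tabulating target against a disaggregation flag. The `Goal`/`Target`/`is_disaggregated` columns and the toy rows below are hypothetical stand-ins, not the notebook's actual objects.

```r
# Hypothetical measure-level table: one row per measure with its goal,
# target and a disaggregation flag
measures <- data.frame(
  Goal   = c(1, 1, 1, 2, 2),
  Target = c("1.1", "1.1", "1.2", "2.1", "2.1"),
  is_disaggregated = c(TRUE, FALSE, TRUE, FALSE, FALSE)
)

# Recode the flag as a factor so both columns appear even when a count is 0
measures$status <- factor(
  ifelse(measures$is_disaggregated, "disaggregated", "not_disaggregated"),
  levels = c("disaggregated", "not_disaggregated")
)

# Subset by goal, then cross-tabulate target against disaggregation status
by_goal <- lapply(split(measures, measures$Goal), function(g) {
  table(Target = g$Target, status = g$status)
})

by_goal[["1"]]
prop.table(by_goal[["1"]], 1)  # share of each target's measures in each status
```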
Finally, here is an example of our current progress with Indonesia: how many indicators we have removed for each target and goal.
# Processed Indonesia data, including disaggregated measures
processedIndo <- read.csv('~/QMSS/G5055_Practicum_Project2/Data/processedIndo.csv')
nrow(processedIndo)
## [1] 4230
# Processed Indonesia data with disaggregated measures removed
processedIndo_No_Disagg <- read.csv('~/QMSS/G5055_Practicum_Project2/Data/processedIndo-WITHOUT disaggregation.csv')
nrow(processedIndo_No_Disagg)
## [1] 1809
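The two row counts give the overall figure: 4230 − 1809 = 2421 rows, about 57% of the processed Indonesia data, drop out. This assumes the second file is the first one with its disaggregated rows removed. A per-target version of the same comparison could look like the base-R sketch below. It uses toy stand-ins for the two data frames and assumes, without confirmation, that both files share a `Target` column.

```r
# Toy stand-ins for processedIndo and processedIndo_No_Disagg; the real files
# are assumed (not confirmed) to share a 'Target' column.
with_disagg    <- data.frame(Target = c("1.1", "1.1", "1.1", "1.2", "2.1"))
without_disagg <- data.frame(Target = c("1.1", "1.2", "2.1"))

# Row counts per target before and after removing disaggregated measures
before <- as.data.frame(table(Target = with_disagg$Target), responseName = "n_before")
after  <- as.data.frame(table(Target = without_disagg$Target), responseName = "n_after")

# Keep every target from the full data, even ones that lost all their rows
removed <- merge(before, after, by = "Target", all.x = TRUE)
removed$n_after[is.na(removed$n_after)] <- 0L
removed$n_removed <- removed$n_before - removed$n_after

removed
sum(removed$n_removed)  # total rows removed across all targets
```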
We also ran the same analysis for Guatemala.
Treemap of Guatemala measures by disaggregation type (tile size = number of measures): other/not_disaggregated 534, age_sex 372, raw_material 127, geographic_region 99, sector 42, time 14.
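A treemap sizes each tile in proportion to its count, so each category's share of the plot area can be recovered directly from the Guatemala counts reported by the treemap. The short R sketch below does this. The counts are copied from that output, and `gt_counts` is just an illustrative name.

```r
# Guatemala measure counts by disaggregation type, copied from the treemap output
gt_counts <- c(
  "other/not_disaggregated" = 534,
  age_sex = 372,
  raw_material = 127,
  geographic_region = 99,
  sector = 42,
  time = 14
)

total  <- sum(gt_counts)                # all measures across categories
shares <- round(gt_counts / total, 3)   # fraction of the treemap area per tile
shares
```

Under these counts, a little under half of Guatemala's measures (534 of 1,188) are not disaggregated.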
## Warning: 'hctreemap' is deprecated.
## Use 'data_to_hierarchical' instead.
## See help("Deprecated")
The following is the same closer look for Guatemala. For each target (subset by goal), it shows whether the measures are disaggregated and how many are.
## [1] "Missingness Across Time:"
## Weighted degree of each measure TBD